Task-Focused Consolidation with Spaced Recall: Making Neural Networks Learn like College Students
Deep neural networks often suffer from a critical limitation known as catastrophic forgetting, where performance on past tasks degrades after learning new ones. This paper introduces Task-Focused Consolidation with Spaced Recall (TFC-SR), a novel continual learning approach inspired by human learning strategies such as Active Recall, Deliberate Practice, and Spaced Repetition. TFC-SR enhances the standard experience replay framework with a mechanism we term the Active Recall Probe: a periodic, task-aware evaluation of the model's memory that stabilizes the representations of past knowledge. We test TFC-SR on the Split MNIST and Split CIFAR-100 benchmarks against leading regularization-based and replay-based baselines. Our results show that TFC-SR performs significantly better than these methods; for instance, on Split CIFAR-100 it achieves a final accuracy of 13.17% compared to Standard Experience Replay's 7.40%. We demonstrate that this advantage comes from the stabilizing effect of the probe itself, not from a difference in replay volume. Additionally, we analyze the trade-off between memory size and performance and show that while TFC-SR performs better in memory-constrained environments, higher replay volume is still more effective when available memory is abundant. We conclude that TFC-SR is a robust and efficient approach, highlighting the importance of integrating active memory retrieval mechanisms into continual learning systems.
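The abstract describes experience replay augmented with a periodic, task-aware probe. A minimal sketch of that loop follows; the buffer, function names, and signatures (`ReplayBuffer`, `train_with_probe`, `probe_every`) are illustrative assumptions, not the authors' API:

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling over the stream."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)  # uniform over the stream so far
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def train_with_probe(stream, buffer, train_step, probe, probe_every=100, replay_k=32):
    """Standard experience replay plus a periodic probe, in the spirit of
    TFC-SR's Active Recall Probe (a hypothetical reading of the abstract)."""
    for step, example in enumerate(stream, start=1):
        batch = [example] + buffer.sample(replay_k)  # mix new data with replayed memories
        train_step(batch)
        buffer.add(example)
        if step % probe_every == 0:
            probe(step)  # task-aware evaluation of past-task memory
```

In this reading, the probe is a callback that evaluates the model on held-out data from each past task; the paper's claim is that running it periodically stabilizes old representations independently of replay volume.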
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom (0.04)
Sequential Function-Space Variational Inference via Gaussian Mixture Approximation
Zhu, Menghao Waiyan William, Hao, Pengcheng, Kuruoğlu, Ercan Engin
Continual learning is learning from a sequence of tasks with the aim of learning new tasks without forgetting old ones. Sequential function-space variational inference (SFSVI) is a continual learning method based on variational inference which uses a Gaussian variational distribution to approximate the distribution of the outputs at a finite number of selected inducing points. Since the posterior distribution of a neural network is multi-modal, a Gaussian distribution can match only one mode of the posterior, whereas a Gaussian mixture distribution can approximate it better. We propose an SFSVI method which uses a Gaussian mixture variational distribution. We also compare different types of variational inference methods with and without a fixed pre-trained feature extractor. We find that, in terms of final average accuracy, Gaussian mixture methods perform better than Gaussian methods, and likelihood-focused methods perform better than prior-focused methods.
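For concreteness, the density form of such a mixture variational distribution can be sketched as follows. This shows only the mixture log-density (computed with log-sum-exp for stability), not the authors' full SFSVI objective; the function name and diagonal-covariance parameterization are assumptions for illustration:

```python
import numpy as np

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a diagonal-covariance Gaussian mixture:
    q(x) = sum_k w_k * N(x; mu_k, diag(std_k^2))."""
    x = np.asarray(x, dtype=float)
    comps = []
    for w, mu, s in zip(weights, means, stds):
        mu, s = np.asarray(mu, float), np.asarray(s, float)
        # Per-component Gaussian log-likelihood plus log mixture weight.
        ll = -0.5 * np.sum(((x - mu) / s) ** 2 + np.log(2 * np.pi * s ** 2))
        comps.append(np.log(w) + ll)
    m = max(comps)  # log-sum-exp trick avoids underflow
    return m + np.log(sum(np.exp(c - m) for c in comps))
```

With a single component this reduces to an ordinary Gaussian log-density; with several components, each mode of the multi-modal posterior can be covered by its own component.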
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States (0.04)
- Europe > France (0.04)
- Europe > Austria > Vienna (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
On the Computation of the Fisher Information in Continual Learning
Continual learning is a rapidly growing subfield of deep learning devoted to enabling neural networks to incrementally learn new tasks, domains or classes while not forgetting previously learned ones. Such continual learning is crucial for addressing real-world problems where data are constantly changing, such as in healthcare, autonomous driving or robotics. Unfortunately, continual learning is challenging for deep neural networks, mainly due to their tendency to forget previously acquired skills when learning something new. Elastic Weight Consolidation (EWC) [1], developed by Kirkpatrick and colleagues from DeepMind, is one of the most popular methods for continual learning with deep neural networks. To this day, this method is featured as a baseline in a large proportion of continual learning studies. However, in the original paper the exact implementation of EWC was not well described, and no official code was provided. A previous blog post by Huszár [2] already addressed an issue relating to how EWC should behave when there are more than two tasks.
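As background for the implementation question the paper raises, a common way EWC estimates the Fisher information is the diagonal "empirical Fisher": the average squared per-example gradient of the log-likelihood. The sketch below, for logistic regression, is a minimal illustration under that assumption; implementations notably differ in whether the label is the observed one (as here) or sampled from the model's own predictive distribution:

```python
import numpy as np

def diagonal_fisher(w, X, y):
    """Diagonal empirical Fisher for logistic regression: the mean
    squared per-example gradient of the log-likelihood at w."""
    fisher = np.zeros_like(w)
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))  # predicted probability of class 1
        g = (yi - p) * xi                  # gradient of log p(y | x, w)
        fisher += g * g
    return fisher / len(X)

def ewc_penalty(w, w_star, fisher, lam=1.0):
    """EWC quadratic penalty anchoring w to the previous-task solution w_star,
    weighted per-parameter by the Fisher estimate."""
    return 0.5 * lam * np.sum(fisher * (w - w_star) ** 2)
```

The penalty vanishes at the old solution and grows fastest along parameters the Fisher estimate deems important for past tasks.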
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Asia > China (0.04)
- Health & Medicine (0.48)
- Education (0.47)
- Information Technology (0.34)
Task agnostic continual learning with Pairwise layer architecture
Most of the dominant approaches to continual learning are based on either memory replay, parameter isolation, or regularization techniques that require task boundaries to compute task statistics. We propose a static architecture-based method that uses none of these. We show that we can improve continual learning performance by replacing the final layer of our networks with our pairwise interaction layer. The pairwise interaction layer uses sparse representations from a winner-take-all style activation function to find the relevant correlations in the hidden layer representations. Networks using this architecture show competitive performance in MNIST- and FashionMNIST-based continual image classification experiments. We demonstrate this in an online streaming continual learning setup where the learning system cannot access task labels or boundaries.
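The abstract does not specify the exact form of the pairwise interaction layer, but one plausible sketch of the two ingredients it names — a winner-take-all sparsification followed by pairwise correlations of the surviving units — is the following; both function names and the upper-triangular feature layout are assumptions:

```python
import numpy as np

def winner_take_all(h, k):
    """Keep the k largest activations and zero out the rest (sparse code)."""
    out = np.zeros_like(h)
    idx = np.argsort(h)[-k:]  # indices of the k largest entries
    out[idx] = h[idx]
    return out

def pairwise_features(h_sparse):
    """Upper-triangular pairwise products of the hidden code; with a
    k-sparse input only k*(k+1)/2 of these terms can be non-zero."""
    outer = np.outer(h_sparse, h_sparse)
    iu = np.triu_indices(len(h_sparse))
    return outer[iu]
```

Because the winner-take-all step leaves only k active units, the pairwise feature vector is itself sparse, which is what limits interference between the representations of different tasks in this reading.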
Learn the Time to Learn: Replay Scheduling in Continual Learning
Klasson, Marcus, Kjellström, Hedvig, Zhang, Cheng
Replay methods are known to be successful at mitigating catastrophic forgetting in continual learning scenarios despite having limited access to historical data. However, while storing historical data is cheap in many real-world settings, replaying all of it is often infeasible due to processing-time constraints. In such settings, we propose that continual learning systems should learn the time to learn and schedule which tasks to replay at different time steps. We first demonstrate the benefits of our proposal by using Monte Carlo tree search to find a proper replay schedule, and show that the found replay schedules can outperform fixed scheduling policies when combined with various replay methods in different continual learning settings. Additionally, we propose a framework for learning replay scheduling policies with reinforcement learning. We show that the learned policies can generalize better in new continual learning scenarios compared to equally replaying all seen tasks, without added computational cost. Our study reveals the importance of learning the time to learn in continual learning, which brings current research closer to real-world needs.
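To make the search problem concrete: a replay schedule assigns, to each time step, which past task to replay, and a search procedure scores candidate schedules by the continual learner's final performance. The sketch below uses plain random search as a lightweight stand-in for the paper's Monte Carlo tree search; the function name and `evaluate` callback are illustrative assumptions:

```python
import random

def search_replay_schedule(num_tasks, steps, evaluate, n_candidates=64, seed=0):
    """Search over replay schedules (one task index per time step).
    `evaluate` maps a schedule to a scalar score, e.g. the final
    average accuracy of a learner trained under that schedule."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        schedule = tuple(rng.randrange(num_tasks) for _ in range(steps))
        score = evaluate(schedule)
        if score > best_score:
            best, best_score = schedule, score
    return best, best_score
```

The paper's contribution is replacing this blind search first with tree search and then with a learned reinforcement-learning policy that generalizes to new task sequences.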
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
- Health & Medicine (0.92)
- Education > Educational Setting > Online (0.45)
TAME: Task Agnostic Continual Learning using Multiple Experts
Zhu, Haoran, Majzoubi, Maryam, Jain, Arihant, Choromanska, Anna
The goal of lifelong learning is to continuously learn from non-stationary distributions, where the non-stationarity is typically imposed by a sequence of distinct tasks. Prior works have mostly considered idealistic settings, where the identity of tasks is known at least at training. In this paper we focus on a fundamentally harder, so-called task-agnostic setting where the task identities are not known and the learning machine needs to infer them from the observations. Our algorithm, which we call TAME (Task-Agnostic continual learning using Multiple Experts), automatically detects the shift in data distributions and switches between task expert networks in an online manner. At training, the strategy for switching between tasks hinges on an extremely simple observation: for each new incoming task there occurs a statistically significant deviation in the value of the loss function that marks the onset of this new task. At inference, the switching between experts is governed by a selector network that forwards the test sample to its relevant expert network. The selector network is trained on a small subset of data drawn uniformly at random. We control the growth of the task expert networks, as well as of the selector network, by employing online pruning. Our experimental results show the efficacy of our approach on benchmark continual learning data sets, outperforming previous task-agnostic methods and even techniques that admit task identities at both training and testing, while using a comparable model size.
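The loss-deviation idea can be sketched with running statistics and a z-score threshold. This is a minimal stand-in for TAME's detection rule; the class name, window, and threshold are assumptions, and the paper's exact statistical test may differ:

```python
class TaskShiftDetector:
    """Flags a task boundary when the current loss deviates sharply
    from its recent running statistics (simple z-score rule)."""
    def __init__(self, window=100, z_thresh=4.0):
        self.window = window
        self.z_thresh = z_thresh
        self.losses = []

    def update(self, loss):
        """Return True if `loss` marks the onset of a new task."""
        if len(self.losses) >= self.window:
            mean = sum(self.losses) / len(self.losses)
            var = sum((x - mean) ** 2 for x in self.losses) / len(self.losses)
            std = max(var ** 0.5, 1e-8)
            if (loss - mean) / std > self.z_thresh:
                self.losses = [loss]  # restart statistics for the new task
                return True
        self.losses.append(loss)
        if len(self.losses) > self.window:
            self.losses.pop(0)  # sliding window over recent losses
        return False
```

On a boundary detection, TAME would spin up (or switch to) a new expert network; here the detector simply resets its statistics.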
Learning an evolved mixture model for task-free continual learning
Recently, continual learning (CL) has gained significant interest because it enables deep learning models to acquire new knowledge without forgetting previously learnt information. However, most existing works require knowing the task identities and boundaries, which is rarely realistic in practice. In this paper, we address a more challenging and realistic setting in CL, namely Task-Free Continual Learning (TFCL), in which a model is trained on non-stationary data streams with no explicit task information. To address TFCL, we introduce an evolved mixture model whose network architecture is dynamically expanded to adapt to the data distribution shift. We implement this expansion mechanism by evaluating the probability distance between the knowledge stored in each mixture model component and the current memory buffer using the Hilbert-Schmidt Independence Criterion (HSIC). We further introduce two simple dropout mechanisms that selectively remove stored examples in order to avoid memory overload while preserving memory diversity. Empirical results demonstrate that the proposed approach achieves excellent performance.
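The HSIC criterion mentioned above has a standard biased empirical estimator, sketched below with RBF kernels. This shows only the estimator itself, not how the paper wires it into the expansion decision; the kernel bandwidth and function names are assumptions:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for row-vector samples X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def hsic(X, Y, gamma=1.0):
    """Biased empirical HSIC estimate, tr(K H L H) / (n - 1)^2,
    where H = I - 11^T / n is the centering matrix. Larger values
    indicate stronger statistical dependence between the samples."""
    n = X.shape[0]
    K, L = rbf_kernel(X, gamma), rbf_kernel(Y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```

In the paper's scheme, a low score between a component's stored knowledge and the current buffer would signal a distribution shift and trigger the addition of a new mixture component.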
Target Layer Regularization for Continual Learning Using Cramer-Wold Generator
Mazur, Marcin, Pustelnik, Łukasz, Knop, Szymon, Pagacz, Patryk, Spurek, Przemysław
The concept of continual learning (CL), which aims to reduce the distance between human and artificial intelligence, has recently come to be regarded by the deep learning community as one of its main challenges. Generally speaking, it refers to the ability of a neural network to effectively learn consecutive tasks (in either supervised or unsupervised scenarios) while preventing the forgetting of already learned information. Therefore, when designing an appropriate strategy, it must be ensured that the network weights are updated in such a way that they correspond to both the current and all previous tasks. However, in practice, it is quite likely that a constructed CL model will suffer from either intransigence (difficulty acquiring new knowledge; see Chaudhry et al. [2018]) or the catastrophic forgetting (CF) phenomenon (a tendency to lose past knowledge; see McCloskey and Cohen [1989]). In recent years, methods for overcoming the above-mentioned problems have been the subject of wide and intensive investigation.
- North America > United States (0.14)
- Europe > Poland > Lesser Poland Province > Kraków (0.06)